 coordinate frame


A Roto-translation invariance

Neural Information Processing Systems

A.1 Rotations in 2 dimensions

In 2-dimensional settings, there exists a single scalar angular position, the yaw angle $\theta$. In order to perform the transformation, we have to express the angular positions in a format suitable for linear transformations; we do so by transforming them to rotation matrices, performing a matrix multiplication, and then transforming the angular positions back to angle format. In 2 dimensions, we use eq. After the rotation, we can convert them back to angle format using the 2-argument arc-tangent function:

$$\theta = \operatorname{atan2}(\sin\theta, \cos\theta) \tag{14}$$

Simplified rotations

In 2 dimensions, the computations can be simplified since rotations commute. First, we show that chained rotations result in angle addition/subtraction, that is:

$$Q(\theta_i)\, Q(\theta_j) = \begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix} \begin{bmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{bmatrix} \tag{15}$$

$$= \begin{bmatrix} \cos\theta_i\cos\theta_j - \sin\theta_i\sin\theta_j & -\cos\theta_i\sin\theta_j - \sin\theta_i\cos\theta_j \\ \sin\theta_i\cos\theta_j + \cos\theta_i\sin\theta_j & -\sin\theta_i\sin\theta_j + \cos\theta_i\cos\theta_j \end{bmatrix} \tag{16}$$

$$= \begin{bmatrix} \cos(\theta_i + \theta_j) & -\sin(\theta_i + \theta_j) \\ \sin(\theta_i + \theta_j) & \cos(\theta_i + \theta_j) \end{bmatrix} \tag{17}$$

$$= Q(\theta_i + \theta_j) \tag{18}$$

Following the same approach, we compute the inverse rotation:

$$Q^\top(\theta_i)\, Q(\theta_j) = Q(-\theta_i)\, Q(\theta_j) = Q(\theta_j - \theta_i) \tag{19}$$

Thus, instead of rotating the angular positions (expressed in rotation matrix form) using the rotation matrix $Q$, in practice we perform the transformation directly on the angles via addition/subtraction, and replace the matrix $Q$ with the identity matrix $I_{1 \times 1}$.
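The identities in eqs. (15)-(19) can be checked numerically. The following is a minimal sketch, not code from the paper: the helper names (`rot`, `matmul`, `transpose`, `to_angle`, `wrap`) and the sample angles are illustrative assumptions, and it uses only the Python standard library.

```python
import math

def rot(theta):
    """Hypothetical helper: 2D rotation matrix Q(theta) as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    """Transpose of a 2x2 matrix (the inverse, for a rotation matrix)."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def to_angle(q):
    """Recover the angle from a rotation matrix via atan2(sin, cos), as in eq. (14)."""
    return math.atan2(q[1][0], q[0][0])

def wrap(theta):
    """Wrap an angle into the range returned by atan2, so results are comparable."""
    return math.atan2(math.sin(theta), math.cos(theta))

# Arbitrary sample angles (assumed values, chosen so the sum exceeds pi).
theta_i, theta_j = 0.7, 2.9

# Matrix route: compose rotations, then convert back to an angle.
composed = to_angle(matmul(rot(theta_i), rot(theta_j)))
relative = to_angle(matmul(transpose(rot(theta_i)), rot(theta_j)))

# Simplified route: add or subtract the angles directly.
assert math.isclose(composed, wrap(theta_i + theta_j), abs_tol=1e-12)
assert math.isclose(relative, wrap(theta_j - theta_i), abs_tol=1e-12)
```

The `wrap` step is needed only for the comparison: direct addition can leave the range of `atan2`, whereas the matrix route always returns a wrapped angle.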


Roto-translated Local Coordinate Frames For Interacting Dynamical Systems

Neural Information Processing Systems

Modelling interactions is critical in learning complex dynamical systems, namely systems of interacting objects with highly non-linear and time-dependent behaviour. A large class of such systems can be formalized as geometric graphs, i.e., graphs with nodes positioned in the Euclidean space given an arbitrarily chosen global coordinate system, for instance vehicles in a traffic scene. Notwithstanding the arbitrary global coordinate system, the governing dynamics of the respective dynamical systems are invariant to rotations and translations, also known as Galilean invariance. As ignoring these invariances leads to worse generalization, in this work we propose local coordinate frames per node-object to induce roto-translation invariance to the geometric graph of the interacting dynamical system. Further, the local coordinate frames allow for a natural definition of anisotropic filtering in graph neural networks. Experiments in traffic scenes, 3D motion capture, and colliding particles demonstrate that the proposed approach comfortably outperforms the recent state-of-the-art.


Unsupervised learning of object frames by dense equivariant image labelling

Neural Information Processing Systems

One of the key challenges of visual perception is to extract abstract models of 3D objects and object categories from visual measurements, which are affected by complex nuisance factors such as viewpoint, occlusion, motion, and deformations. Starting from the recent idea of viewpoint factorization, we propose a new approach that, given a large number of images of an object and no other supervision, can extract a dense object-centric coordinate frame. This coordinate frame is invariant to deformations of the images and comes with a dense equivariant labelling neural network that can map image pixels to their corresponding object coordinates. We demonstrate the applicability of this method to simple articulated objects and deformable objects such as human faces, learning embeddings from random synthetic transformations or optical flow correspondences, all without any manual supervision.



Latent Field Discovery In Interacting Dynamical Systems With Neural Fields

Neural Information Processing Systems

Systems of interacting objects often evolve under the influence of field effects that govern their dynamics, yet previous works have abstracted away from such effects, and assume that systems evolve in a vacuum. In this work, we focus on discovering these fields, and infer them from the observed dynamics alone, without directly observing them.




Roto-translated Local Coordinate Frames For Interacting Dynamical Systems

Neural Information Processing Systems

First, we introduce canonicalized roto-translated local coordinate frames for interacting dynamical systems formalized in geometric graphs. Second, by operating solely on these coordinate frames, we enable roto-translation invariant edge prediction and roto-translation equivariant trajectory forecasting. Third, we present a novel methodology for natural anisotropic continuous filters based on relative linear and angular positions of neighboring objects in the canonicalized local coordinate frames.
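To illustrate why operating on local coordinate frames gives roto-translation invariance, here is a minimal 2D sketch. It is an assumption-laden illustration rather than the authors' implementation: poses are represented as hypothetical `(x, y, theta)` tuples, and `to_local_frame` and `global_transform` are names introduced here only for the example.

```python
import math

def to_local_frame(center, neighbor):
    """Express the neighbor pose (x, y, theta) in the frame of the center pose.

    Assumed convention: translate by the center position, rotate by the
    inverse of the center yaw, and subtract the yaw angles.
    """
    cx, cy, ct = center
    nx, ny, nt = neighbor
    dx, dy = nx - cx, ny - cy
    c, s = math.cos(ct), math.sin(ct)
    # Apply Q(ct)^T to the relative position.
    lx = c * dx + s * dy
    ly = -s * dx + c * dy
    # Relative yaw, wrapped into the range returned by atan2.
    lt = math.atan2(math.sin(nt - ct), math.cos(nt - ct))
    return lx, ly, lt

def global_transform(pose, angle, shift):
    """Apply a global rotation by `angle` followed by a translation by `shift`."""
    x, y, t = pose
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], t + angle)

# Two example poses (assumed values).
a = (1.0, 2.0, 0.3)
b = (4.0, -1.0, 1.2)

before = to_local_frame(a, b)

# Change the arbitrary global coordinate system.
a2 = global_transform(a, 1.1, (5.0, -3.0))
b2 = global_transform(b, 1.1, (5.0, -3.0))
after = to_local_frame(a2, b2)

# The local-frame representation is unchanged by the global roto-translation.
assert all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(before, after))
```

Because the relative pose is the same before and after the global change of coordinates, any function computed only from such local-frame quantities is invariant to that change by construction.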


Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation

arXiv.org Artificial Intelligence

Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.


PPL: Point Cloud Supervised Proprioceptive Locomotion Reinforcement Learning for Legged Robots in Crawl Spaces

arXiv.org Artificial Intelligence

Legged locomotion in constrained spaces (called crawl spaces) is challenging. In crawl spaces, current proprioceptive locomotion learning methods struggle to traverse because only ground features are inferred. In this study, a point cloud supervised RL framework for proprioceptive locomotion in crawl spaces is proposed. A state estimation network is designed to estimate the robot's collision states as well as ground and spatial features for locomotion. A point cloud feature extraction method is proposed to supervise the state estimation network. The method uses a representation of the point cloud in a polar coordinate frame and MLPs for efficient feature extraction. Experiments demonstrate that, compared with existing methods, our method exhibits faster iteration time in training and more agile locomotion in crawl spaces. This study enhances the ability of legged robots to traverse constrained spaces without requiring exteroceptive sensors. In recent years, legged robots have demonstrated remarkable terrain traversal capabilities, exhibiting significant application value.